TECHNICAL  REPORT  NO.  21 
T.  J.  DICICCIO 
NOVEMBER  1987 


U.  S.  ARMY  RESEARCH  OFFICE 
CONTRACT  DAAG29-85-K-0239 
THEODORE  W.  ANDERSON,  PROJECT  DIRECTOR 

DEPARTMENT  OF  STATISTICS 
STANFORD  UNIVERSITY 
STANFORD,  CALIFORNIA 


APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED. 


LIKELIHOOD  INFERENCE  FOR 
LINEAR  REGRESSION  MODELS 


T.  J.  DiCiccio 
Stanford  University 


TECHNICAL  REPORT  NO.  21 
November  1987 


U.  S.  Army  Research  Office 
Contract  DAAG29-85-K-0239 
Theodore  W.  Anderson,  Project  Director 


Department  of  Statistics 
Stanford  University 
Stanford,  California 


Approved  for  Public  Release;  Distribution  Unlimited. 


1.  Introduction 


Consider  observed  random  variables  Y, , . .  .  ,Y  which  follow  a 

1  n 

linear  regression  model;  that  is,  let  Y^  =  +  aE^  ,  where 

yi  =  .^iejXij  (1=1 . n) 

and  E^,...,En  are  independent  and  identically  distributed  with  known 

T 

probability  density  function  f  .  The  vectors  x.  =  (x. x.  ) 

i  ll  xp 

(i=l,...,n)  of  covariate  values  are  assumed  to  be  known,  and  the  vector 
3  =  (8, , ,3  )  of  regression  coefficients  is  to  be  estimated.  Based 

1  p 

on  the  sample  Y  =  (Y^,...,Yn)  ,  the  log  likelihood  function  for  8 
and  0  =  log  a  is 

n  -A 

1(3, 9;Y)  =  -n0  +  ][  log  f{e  (Y .  -0x.)}  . 

i-1  1  1 

Several  authors,  including  Fisher  (1934),  Fraser  (1979),  and  Ver- 
hagen  (1961) ,  have  argued  that  inferences  concerning  the  parameters  8 
and  a  should  be  made  conditionally  on  the  sample  configuration 

A 

A  =  (A^,...,An)  consisting  of  the  standardized  residuals  A^  =  (Y.j-Bx^) 
/a  (i=l,...,n)  ,  where  8  and  8  are  the  maximum  likelihood  estima¬ 
tors  of  8  and  a  .  The  joint  conditional  density  of  the  pivotal  sta¬ 
tistics  =  (8-8)  /a  and  ?2  =  0-@  given  the  configuration  A  is 
determined  by 

fP15P2  |A(pl’p2)0C  exp{1(f^aPi»^2;Y^<r  expfKp^p^A)  >  •  (1) 
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If  the  scale  parameter  is  known  and  has  the  value  1  ,  then  a  similar 
expression  obtains.  In  that  case,  the  conditional  density  of  P=8-§ 
given  the  configuration  A=  (A^,...,An)  of  residuals  A^=Y.£- 

(i=l,...,n)  satisfies 


fp|A(p)oc  exp{l(§+p;Y)}a  exp{l(p;A) }  , 


(2) 


where  l(g;Y)  is  the  log  likelihood  function  for  8  based  on  Y  . 

The  use  of  (1)  or  (2)  for  exact  conditional  inference  usually 
requires  numerical  integration  over  several  dimensions,  which  can  be 
cumbersome.  Accurate  approximate  methods  for  inference  that  require 
less  computation  are  therefore  of  interest,  and  log  likelihood  ratio 
statistics  are  a  possible  basis  for  such  methods. 

Suppose  that  q  of  the  parameters  8^,..., 8^,8  are  of  interest. 
The  number  of  nuisance  parameters  is  p-q+1  or  p-q,  depending  on 
whether  the  scale  parameter  is  unknown  or  not .  To  simplify  the  nota¬ 
tion,  let  (J>  =  (co^,  . .  .  ,co^)  consist  of  the  parameters  of  interest,  let 
ip  consist  of  the  nuisance  parameters,  and  let  l(co)  be  the  log  like¬ 
lihood  function  for  co  =  based  on  Y  .  Fixing  the  value  of  <p  > 

the  log  likelihood  ratio  statistic  for  that  parameter  is 


W(<J>)  =  2{l(co)  -  1(4>,^)}  , 


where  0)  is  the  unconstrained  maximum  likelihood  estimator  of  co  and 


is  the  restricted  maximum  likelihood  estimator  of  co  corres¬ 
ponding  to  the  specified  value  of  cj>  .  For  the  case  q=l  in  which  a 


2 


scalar  parameter  is  of  interest,  the  signed  square  root  of  the  log  like¬ 
lihood  ratio  statistic  for  cp  =  is  defined  by 


RCa^)  =  sgn(co1~w1){W(w1)} 


1/2 


It  is  shown  in  the  usual  large-sample  maximum  likelihood  theory 

that  the  marginal  distributions  of  W  and  R  tend  under  mild  conditions 

2 

to  the  chi-squared  distribution  with  q  degrees  of  freedom  and  the 

standard  normal  distribution  N(0,1)  respectively,  as  the  sample  size 

n  increases.  Hinkley  (1978)  has  shown  that  these  limits  also  hold  for 

the  conditional  distributions  of  W  and  R  . 

The  standard  normal  approximation  to  the  conditional  distribution 

-1/2 

of  R  has  error  of  order  Op(n  )  »  and  the  approximate  confidence 

limits  thus  obtained  are  correct  to  that  order.  The  error  of  the 

N(0,1)  approximation  can  be  reduced  to  order  0p(n  "S  by  taking  the 

conditional  mean  of  R  into  account,  and  this  error  can  be  further 
-3/2 

reduced  to  0^(n  )  by  adjusting  for  both  the  conditional  mean  and 

variance  of  R  .  Formulae  for  these  mean  and  variance  adjustments  are 
given  in  section  2.  Such  corrections  to  the  signed  roots  of  log  like¬ 
lihood  ratio  statistics  for  scalar  parameters  have  been  discussed  very 
generally  by  Bamdorf f-Nielsen  (1986)  and  in  the  case  of  location-scale 
models  by  DiCiccio  (1987)  . 

The  chi-squared  approximation  to  the  conditional  distribution  of 

W  has  error  of  order  Op(n  »  and  this  error  can  be  reduced  to  order 

-3 /2 

0  (n  )  by  the  use  of  a  scaling  factor  which  accounts  for  the 
P 
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conditional  mean  of  W  .  A  formula  for  this  adjustment  factor  is 
given  in  section  3.  Such  corrections  for  log  likelihood  ratio  statis¬ 
tics,  often  referred  to  as  Bartlett  adjustments,  have  been  discussed 
very  generally  by  Barndorff -Nielsen  and  Cox  (1984)  . 

2.  Approximate  Inference  for  a  Scalar  Parameter 
Adopting  the  notation  of  Sprott  (1980) ,  let 

I  =  [-321(co)/3<o  3o).  ]  -  ,  I  =  [331(w)/3o)  3o3,  3w  1  ~  , 

ab  a  b  6f=GJ  abc  a  b  c  co=co 

I  ,  ,  =  [3  1(cu)/3oj  303,30}  3o},]  a 

abed  v  '  abc  dJ03=03 

Thus  1=  ((I  ,))  is  the  observed  information  matrix  for  03,  and 
ab 

-1  ab 

I  =  ((I  ))  is  its  inverse.  In  the  expressions  that  follow,  it  is 

convenient  to  make  use  of  the  notation  convention  for  which  summation 
over  every  index  appearing  both  as  a  subscript  and  a  superscript  is 
understood.  The  range  of  summation  is  from  1  to  p  +  1  if  the  scale 
parameter  is  unknown  and  from  1  to  p  otherwise. 

Sprott  (1980)  has  shown  that  the  Taylor  expansion  of  W(g}^) 

A 

about  03^  is 

W(031)  =  U2  AU3  ~Y2  BU4  +  Op(n~3/2)  ,  (3) 

where 


4 


u 


B 


(<B1-«1)/(I11)1/2  ,  A ■  IabclalIblICl/(I11)3/2  , 
(1abcdiallbllClldl  +  3SaSbJab)/(in)2  , 


S  =1  IblIcl  ,  Jab=  Xab  -  IalIbl/IH 
a  abc  ’ 


-1/2  -1 

Note  that  A  is  0  (n  )  and  B  is  0  (n  )  .  It  follows  that 

P  P 


R(w1 )  has  the  expansion 


RCWj)  =  U-|  AU2  --y  (3B  +  A2)U3+  0p(n  3/2)  . 


(4) 


Using  calculations  similar  to  those  described  by  Hinkley  (1978) , 

expressions  (1)  and  (2)  allow  the  conditional  cumulants  of  R  to  be 

-3/2 

determined  with  error  of  order  0p(n  )  .  Ignoring  terms  of  that 
order,  the  conditional  mean  and  variance  of  R  are 


m  =  -|  Iat)CialIbC/ 1/2  “-g-  IabcIalIblICl/(I1;L)3/2 


=  |  IabcIal(2Ibc  + Jbc)/(in)1/2 


(5) 


2  ,  .1  _  ,TabTcd  TabTcdx  1  T  T  r_/TabTcdTef  _ab_cdTef. 

s  =l+y  I  ,  ,(I  I  -J  J  )+-rrc  1  t.  I,  .,13(1  I  I  -J  J  J  ) 

4  abed  12  abc  def  v  ' 


+  2«adIbeIc£  -  JadJbeJCf)}  -  m2 


and  the  higher-order  conditional  cumulants  of  R  are  zero.  Note  that 

-1/2 

m  is  of  order  0  (n  )  and  involves  second-  and  third-order 
P 


5 


2  -1 

derivatives,  whereas  s  is  l  +  O^Cn  )  and  involves  derivatives  up 
to  the  fourth  order.  Since  the  N(0,1)  approximation  to  the  condi¬ 
tional  distribution  of  R-m  has  error  of  order  0  (n  and  can  be 

P 

used  to  find  approximate  confidence  limits  correct  to  that  order,  it 

is  thus  possible  to  improve  the  accuracy  of  the  usual  large-sample 

approximation  by  taking  third-order  derivatives  into  account.  The 

-3/2 

N(0,1)  approximation  for  (R-m)/s  has  error  of  order  0  (n  )  , 

P 

but  the  improvement  in  accuracy  achieved  by  using  s  requires  fourth- 

order  derivatives  and  a  substantial  increase  in  computation. 

When  testing  a  single  hypothesized  value  of  ,  the  approximate 

procedures  based  on  R  are  not  difficult  to  implement;  however,  when 

setting  approximate  confidence  limits  for  co^  ,  these  procedures  can  be 

complicated  by  the  necessity  of  calculating  the  restricted  maximum 

likelihood  estimate  of  co  for  each  of  several  values  of  .  By 

-3/2 

inverting  expansion  (A)  and  ignoring  terms  of  order  O^Cn  )  ,  the 

/v  11  1/2 

a  quantile  of  U  =  (co^-o^)/ (I  )  is  found  to  be 


ra+  I  • 


where  r  =m  +  sz  and  z  is  the  a  quantile  of  the  N(0,1)  dis- 

CL  CL  CL 

-3/2 

tribution.  Thus,  correct  to  order  0^(n  )  ,  the  upper  a  confi¬ 
dence  limit  for  is 


S1+(I11)1/2{ra+i  Ar2+i(3B+5A2)r 2} 


(6) 
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Expression  (6)  provides  the  same  order  of  accuracy  as  the  direct  use 
of  the  N(0,1)  approximation  for  (R-m)/s  in  setting  approximate 
confidence  limits,  and  it  is  usually  much  simpler  to  apply.  However, 
in  very  small  samples,  expression  (6)  may  suffer  from  failure  of  mono¬ 
tonicity  and  produce  inaccurate  results.  In  such  cases,  the  use  of 
(R-m)/s  is  preferable. 


It  is  evident  from  (6)  that  correct  to  order  Op(n  '*’)  the  upper 
a  confidence  limit  for  is 


S1+  (I1:L)1/2{(za+m)  +j  A(za-hn)2} 


(7) 


For  general  scalar  parameter  models,  Cox  (1980)  and  McCullagh  (1984) 

have  derived  expansions  of  conditional  confidence  limits  correct  to 

order  Op(n  "*")  ,  and  Barndorff-Nielsen  (1985,  1986)  has  derived  a 

-3/2 

similar  expansion  correct  to  order  (^(n  )  •  I*1  the  present 

regression  context,  if  there  are  no  nuisance  parameters  present, 
expansion  (6)  is  equivalent  to  Bamdorff-Nielsen's  approximate  limit, 
and  (7)  is  equivalent  to  the  limits  of  Cox  and  McCullagh. 

The  methods  discussed  in  this  section  can  be  applied  for  approx¬ 
imate  inference  concerning  the  a  quantile  y^  of  an  observation 

Y  corresponding  to  a  specified  set  of  covariate  values  x=  (x^,..., 

T 

Xp)  .  This  application  is  achieved  by  formulating  the  model  so  that 
ya  appears  as  one  of  the  regression  coefficients.  For  instance,  if 
81  is  an  intercept  term  and  x^  =  . . .  =  xml  =  x1  =  1  ,  then  y„  =  8-,  + 


nl  1 


'a  "1 


8-X-  +  . . .  +  8  x  +  ere  ,  where  e„  is  the  a  quantile  of  E-  .  The 

Z  Z  p  p  Ot  CX  X 
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regression  model  can  be  written  as  ■  yQ  +  ^2  ^xi2~x2^  +  *  •  •  +  SpCk^-x^) 

+  o(Ei-e(X)  (i=l,...,n)  ,  with  ya  replacing  .  Using  this  formu¬ 
lation  of  the  model,  the  mean  and  variance  adjustments  to  the  signed 
root  of  the  log  likelihood  ratio  statistic  for  y^  can  be  derived  from 
expression  (5),  and  expressions  (6)  and  (7)  provide  approximate  confidence 
limits . 

Example  1.  In  location-scale  models,  for  which  p=l  and  U^=y 
(i=l,...,n)  ,  the  accuracy  of  the  approximate  methods  can  feasibly  be 
assessed  by  comparison  with  exact  conditional  results.  Fraser  (1979, 
p.  26)  has  presented  a  location-scale  analysis  of  Darwin’s  data  assum¬ 
ing  that  the  error  variables  have  Student's  distribution  with  X 
degrees  of  freedom.  Darwin's  data  consists  of  15  observations.  Using 
likelihood  methods,  Fraser  concluded  that  values  of  X  in  the  range 
1  to  9  are  well  supported  by  the  data.  Fraser's  analysis  includes 
the  one-sided  significance  level  for  the  hypothesis  y  =  0  and  95% 
confidence  intervals  for  y  and  a  in  each  of  the  cases  X=l,3,6,9, 
and  00  .  For  these  values  of  X  ,  the  exact  significance  levels  are 
0.041%,  0.300%,  0.765%,  1.114%,  and  2.485%  respectively;  the  levels 
obtained  from  the  N(0,1)  approximation  for  (R-m)/s  are  0.034%, 

0.310%,  0.774%,  1.102%,  and  2.437%.  Table  1  shows  the  approximate 
95%  confidence  intervals  for  y  and  a  obtained  from  (R-m)/s  and 
expression  (6) .  Each  endpoint  of  the  approximate  intervals  is  accom¬ 
panied  by  its  true  conditional  significance  level  determined  by  numeri¬ 
cal  integration.  Both  methods  give  fairly  accurate  approximations,  and 
similar  accuracy  is  obtained  for  the  approximate  intervals  of  other 
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confidence  levels.  Sprott  (1980,  1982)  has  described  a  different 
approach  to  approximate  conditional  inference  in  location -scale  models, 
and  he  also  considered  Darwin's  data. 


3.  Bartlett  Adjustments 


Suppose  that  the  q-dimensional  vector  <j>  =  (w.  is  of 

1  q 


interest,  and  let 


I  = 


I<M> 


I# 


be  the  partitioned  observed  information  matrix  for  co=  (<f>,iJ0  .  By 

A 

using  an  expansion  of  W(<j>)  about  <p  in  conjunction  with  (1)  and  (2), 

-3/2 

it  can  be  shown  that  to  error  of  order  0  (n  )  the  conditional 

P 

expectation  of  the  log  likelihood  ratio  statistic  for  <j>  is 


,  ,  1  ,  /TabTcd  ^ab^cd.  ,  1  T  T  r~ ._abTcdTef  ^ab^cd^ef . 

b4,=  q  +  4  Iabcd(I  1  “K  K  )+12IabcIdef{3(l  11  'K  K  K  > 


(8) 


+  2(iadIbelCf-KadKbeKcf)}, 


where  K=  ((Kab))  is  defined  by 


K  = 


0  0 


0  I 


-1 
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The  Xq  approximation  to  the  conditional  distribution  of  W($) 
has  error  of  order  0^(n  ,  while  the  error  in  the  approximation  for 

(b^/q)  is  of  order  O^Cn  .  Dividing  W(4>)  by  (b^/q)  pro¬ 

duces  a  quantity  whose  conditional  distribution  is  better  approximated 

by  the  x^  distribution.  In  the  case  q  =  l»  K3^  =  I3^  -  I^T^/i^l  = 

3b  2  2  2 

J  and  the  adjustment  (8)  equals  s  +m  ,  where  s  and  m  are  as 

defined  in  (5) . 

Example  2.  Consider  the  location-scale  model  of  Darwin’s  data 
described  in  Example  1,  and  let  X  =  3  .  The  approximate  90%,  95%,  and 
99%  confidence  intervals  for  y  determined  using  the  Bartlett  adjust¬ 
ment  are  (12.93,  40.34),  (9.72,  43.24),  and  (2.49,  49.47),  which  have 
true  conditional  coverage  probabilities  89.9%,  94.9%,  and  99.0%  .  The 
approximate  intervals  for  a  are  (15.38,  38.83),  (14.15,  42.78),  and 
(12.03,  52.01),  having  true  conditional  confidence  levels  89.9%,  94.9%, 
and  99.0%  .  The  use  of  the  Bartlett  adjustments  produces  very  accurate 
approximations  in  this  situation. 


4.  A  Summary  of  Derivatives 


This  section  presents  formulae  for  the  calculation  of  derivatives 
of  the  log  likelihood  function  for  8  and  0  evaluated  at  the  maximum 
likelihood  estimators.  The  second-,  third-,  and  fourth-order  deriva¬ 
tives  are 
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h  3  ~a  Exiaxib§(  ^  (Ai)  * 
a  b 


h  9  =  e  1  ZxiaAig(2)  (Ai>  ’ 
a 


l0Q  =  -n  +  ZA^g(2)(Ai)  , 


*6  3,3  Zxiaxibxicg(  ^Ai^  ’ 

a  b  c 


g  9  ExiaXib^2g  (A^+A-jg  (A±) }  , 

a  b 


S  ee  ‘  ■°'lziIiat3Aig(2)  (V  +  Aig(3)  (V 1  - 

a 


l06e  =  n-Z{3A2g(2)(Ai)  +  A3g(3)(A.)}  , 


-  ^-4  „  (4) ,  . 

1g  3  3  3  .  =  a  Zxiaxibx±cxidg  (Ai*  ’ 
abed 


*3  3,3  e  °  ZxiaXibXxc^3gV  >,(Ai)+AigV' 
a  b  c 


.(3)  (4) 


£6  s,  ee  -  a"2  Zxiaxib{4s<2>  <v  +  5Aig<3>  <At>  +  A2lSm  <A±)  }  , 

a  b 
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h  see"5'1  2xia{7AigC2)CAi)  +  6A^g(3)CAi)+A^g(4)(Ai)}  , 

a 

le00e=  -n  +  S{7A2g(2)(Ai)  +6A3g(3)(A1)  +A4g(4)(Ai)}  , 

where 


L(S  -0  l«,e;Y)/3ea3|Sbl(M).(M) 

a  d 


Q  ts  xce,Q;Y> /36a3e] (B>e>=(i, e)  » 

a 

etc.  (a,b,c,d =  1, . . . ,p)  ,  g =  log  f  ,  and  each  sum  is  taken  over 
x  “  1}..«9  n  • 
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Table  1.  Approximate  conditional  95%  confidence  intervals 


for  y 

and  a  from 

Darwin's  data 

y 

a 

A 

(R-m) / s 

(6) 

(R-m) / s 

(6) 

1 

13.16,41.08 

12.62,41.70 

8.11,34.06 

8.12,33.88 

(2.72,2.24) 

(2.28,1.92) 

(2.65,2.54) 

(2.66,2.63) 

3 

9.43,42.95 

(2.48,2.54) 

9.83,43.14 

(2.71,2.43) 

15.16,45.07 

(2.44,2.56) 

15.16,45.10 

(2.45,2.54) 

6 

5.79,43.01 

(2.48,2.61) 

5.83,42.99 

(2.50,2.62) 

19.50,50.99 

(2.45,2.60) 

19.49,50.90 

(2.44,2.64) 

9 

4.12,42.85 

(2.52,2.59) 

4.12,42.75 

(2.52,2.65) 

21.69,53.51 

(2.43,2.62) 

21.68,53.43 

(2.42,2.65) 

00 

0.13,41.73 

(2.55,2.55) 

0.22,41.65 

(2.59,2.59) 

27.59,59.29 

(2.44,2.59) 

27.58,59.24 

(2.43,2.62) 

The  true  conditional  one-sided  significance  levels  of  the  interval 
endpoints  are  shown  as  percentages  in  parentheses. 
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